A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection
Abstract:
The K-nearest neighbor (KNN) algorithm is one of the most frequently used techniques in data mining because of its integrity and performance. Although KNN is highly effective in many cases, it has some essential deficiencies that affect its classification accuracy. First, its effectiveness is degraded by redundant and irrelevant features. Second, it does not consider the differences between samples, which leads to inaccurate predictions. In this paper, we propose a novel scheme for improving the accuracy of the KNN classification algorithm based on a new weighting technique and stepwise feature selection. First, we use a stepwise feature selection method to eliminate irrelevant features and select the features that are highly correlated with the class category. Then, a new weighting method assigns an authority value to each sample in the training dataset based on its neighbors' categories and Euclidean distances. This weighting approach gives higher preference to samples whose neighbors are close in Euclidean distance and belong to the same category, which can effectively increase the classification accuracy of the algorithm. We evaluated the accuracy of the proposed method and compared it with the traditional KNN algorithm and some similar works on five real-world UCI datasets. The experimental results showed that the proposed scheme (denoted WAD-KNN) performed better than the traditional KNN algorithm and the other approaches considered, with an improvement of approximately 10% in accuracy.
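The abstract does not state the weighting formula, so the following is only a minimal sketch of the general idea, not the WAD-KNN method itself. It assumes a hypothetical authority score: for each training sample, sum 1 / (1 + d) over those of its k nearest training neighbors that share its class, then divide by k. The function names, the toy data, and the choice of k are all illustrative assumptions, and the stepwise feature selection step is left out.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def authority_weights(X, y, k=3):
    """Assumed weighting rule (NOT the paper's formula): a training sample
    scores higher when its k nearest training neighbors are close to it
    and carry the same class label."""
    weights = []
    for i, xi in enumerate(X):
        # Distances from sample i to every other training sample.
        neighbours = sorted(
            ((euclidean(xi, xj), y[j]) for j, xj in enumerate(X) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        # Only same-class neighbors contribute; closer ones contribute more.
        score = sum(1.0 / (1.0 + d) for d, label in neighbours if label == y[i])
        weights.append(score / k)
    return weights


def weighted_knn_predict(X, y, weights, query, k=3):
    """Classify `query` by letting its k nearest training samples vote,
    each vote scaled by that sample's precomputed authority weight."""
    nearest = sorted(range(len(X)), key=lambda j: euclidean(query, X[j]))[:k]
    votes = defaultdict(float)
    for j in nearest:
        votes[y[j]] += weights[j]
    return max(votes, key=votes.get)


# Toy data, made up for illustration: two clusters plus one point
# labeled "b" that sits inside the "a" cluster.
X_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0),
           (0.5, 0.5)]
y_train = ["a", "a", "a", "b", "b", "b", "b"]

w = authority_weights(X_train, y_train, k=2)
prediction = weighted_knn_predict(X_train, y_train, w, (0.5, 0.4), k=3)
```

In this toy setup, the "b" point inside the "a" cluster has no same-class neighbors among its two nearest, so under the assumed rule it carries no weight in the vote, while samples surrounded by their own class keep a positive weight.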
Similar resources
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on Shuffled Frog Leaping Algorithm that is called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most of the biological datasets such as cancer datasets have a large number of genes and few samples. However, most of these genes are not usable in some tasks for example in cancer classification....
A Novel Approach to Feature Selection Using PageRank Algorithm for Web Page Classification
In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...
IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF
The growing use of the Internet and phenomena such as sensor networks has led to an unnecessary increase in the volume of information. Though it has many benefits, it causes problems such as the need for more storage space and better processors, as well as for data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...
Modeling and design of a diagnostic and screening algorithm based on hybrid feature selection-enabled linear support vector machine classification
Background: In the current study, a hybrid feature selection approach involving filter and wrapper methods is applied to some bioscience databases with various records, attributes and classes; hence, this strategy enjoys the advantages of both methods, such as fast execution, generality, and accuracy. The purpose is to diagnose disease status and estimate patient survival. Method...
Developing a Pattern Based on Speech Acts and Language Functions for Developing Materials for the Course "The Study of Islamic Texts Translation"
The aim of the present study is to propose a pattern based on speech acts and language functions for developing materials for the course "The Study of Islamic Texts Translation". In the new pattern, in order to develop better and more engaging materials, and unlike existing books, Austin's (1962) levels of speech acts, Searle's (1976) classification of speech acts, and Halliday's (1978) language functions are used. To this end, 57 verses were randomly selected from different parts of the Quran...
Journal title
Volume 12, Issue 4
Pages 90-104
Publication date 2020-12-01